[1] "species" "island" "bill_length_mm"
[4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
[7] "sex" "year"
Principal Data Scientist @ Jumping Rivers:
Project management.
Python & machine learning support for clients.
Teach courses in programming, SQL, ML.
Organise North East & Leeds data science meetups.
How I got into Data Science
First encounter with MLOps
Getting to grips with Vetiver (code examples)
MLOps tips & tricks
jumpingrivers.com | 𝕏 @jumping_uk
PhD in Astrophysics (started 2017)
Extra training in “Data Intensive Science”
… academia is hard
Joined Jumping Rivers full time in 2022 (following an internship)
My initial experience:
Software development (check out diffify.com)
Course writing and teaching
LOTS of merge requests
Conferences and meetups
The dreaded architecture diagram…
Countless permutations
Very multidisciplinary
Expensive
Palmer Penguins dataset
Clean and split the data using {tidyr} and {rsample}:
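A minimal sketch of these two steps, assuming the data come from the {palmerpenguins} package and that rows with any missing value are simply dropped. The 75/25 split, the seed and the object names are illustrative choices, not necessarily the talk's exact code.

```r
library(palmerpenguins)  # provides the `penguins` data frame
library(tidyr)           # drop_na()
library(rsample)         # initial_split(), training(), testing()

# Remove rows with a missing measurement or a missing sex
penguins_df <- drop_na(penguins)

# Reproducible 75/25 train/test split, stratified so every species appears in both sets
set.seed(123)
penguin_split <- initial_split(penguins_df, prop = 0.75, strata = species)
penguin_train <- training(penguin_split)
penguin_test  <- testing(penguin_split)

# The eight column names of the dataset
names(penguins_df)
```

Stratifying on `species` keeps the class proportions similar in the training and test sets, which matters because the three species are not equally common.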
Fit a {tidymodels} model to predict species:
Convert our {tidymodels} model to a {vetiver} model:
Contains all the info needed to version, store and deploy our model!
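One way the fit-and-convert step could look, assuming a classification tree (the default `rpart` engine) on the four body measurements. The model type, the predictors and the model name `penguins_species` are hypothetical choices for illustration. `vetiver_model()` bundles the fitted workflow with its metadata and an input prototype.

```r
library(palmerpenguins)
library(tidyr)
library(rsample)
library(parsnip)    # decision_tree()
library(workflows)  # workflow(), add_formula(), add_model()
library(vetiver)

penguins_df <- drop_na(penguins)
set.seed(123)
penguin_split <- initial_split(penguins_df, prop = 0.75, strata = species)

# Hypothetical model: a classification tree predicting species from body measurements
penguin_wf <- workflow() |>
  add_formula(species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification"))

penguin_fit <- fit(penguin_wf, data = training(penguin_split))

# Wrap the fitted workflow as a deployable vetiver model
v <- vetiver_model(penguin_fit, model_name = "penguins_species")
v
```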
Retrieve a model
Inspect the stored versions
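A sketch of storing, retrieving and listing model versions with {pins}, assuming a temporary versioned board from `board_temp()`. In practice you would point at shared storage (a network folder, Posit Connect, or cloud storage). The model set-up repeats a hypothetical classification tree so that the snippet runs on its own.

```r
library(palmerpenguins)
library(tidyr)
library(parsnip)
library(workflows)
library(vetiver)
library(pins)

# Hypothetical model (classification tree on body measurements)
penguins_df <- drop_na(penguins)
penguin_fit <- workflow() |>
  add_formula(species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_df)
v <- vetiver_model(penguin_fit, model_name = "penguins_species")

# A temporary, versioned board standing in for shared storage
board <- board_temp(versioned = TRUE)

# Store the model; writing changed content later creates a new version
vetiver_pin_write(board, v)

# Retrieve the latest version of the model
v_latest <- vetiver_pin_read(board, "penguins_species")

# Inspect the stored versions (version id, creation time, hash)
versions <- pin_versions(board, "penguins_species")
versions

# Retrieve one specific version by its id
v_first <- vetiver_pin_read(board, "penguins_species", version = versions$version[1])
```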
We deploy models as APIs which take input data and send back model predictions.
APIs can be hosted at public endpoints on the web.
We can run them on localhost (during testing / development).
{vetiver} uses {plumber} to create a model API.
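A minimal sketch of serving a model locally, assuming the same hypothetical classification tree; the port number 8080 is an arbitrary choice. `pr_run()` blocks the R session, so it is commented out here, as are the client calls that would be run from a second session once the API is up.

```r
library(palmerpenguins)
library(tidyr)
library(parsnip)
library(workflows)
library(vetiver)
library(plumber)

# Hypothetical model (classification tree on body measurements)
penguins_df <- drop_na(penguins)
penguin_fit <- workflow() |>
  add_formula(species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_df)
v <- vetiver_model(penguin_fit, model_name = "penguins_species")

# Build a plumber router with a POST /predict endpoint for the model
api <- pr() |> vetiver_api(v)

# Serve it on localhost (this call blocks the R session):
# pr_run(api, port = 8080)

# From another R session, send input data and get predictions back:
# endpoint <- vetiver_endpoint("http://127.0.0.1:8080/predict")
# predict(endpoint, head(penguins_df))
```

The same router can be hosted at a public endpoint; only the URL passed to `vetiver_endpoint()` changes for the client.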
Our Dockerfile contains a series of commands to:
Install the system libraries (Windows|Mac|Linux).
Set the R version and install the required R packages.
Run the API in the deployment environment.
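One way to generate these files, modelled on the {vetiver} documentation examples and assuming a temporary board and a temporary output directory. `vetiver_write_plumber()` writes the API script, and `vetiver_write_docker()` writes a Dockerfile plus an {renv} lockfile recording the R packages. A temporary board is not reachable from inside a container, so a real deployment would pin the model to shared storage first.

```r
library(palmerpenguins)
library(tidyr)
library(parsnip)
library(workflows)
library(vetiver)
library(pins)

# Hypothetical model (classification tree on body measurements)
penguins_df <- drop_na(penguins)
penguin_fit <- workflow() |>
  add_formula(species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_df)
v <- vetiver_model(penguin_fit, model_name = "penguins_species")

board <- board_temp(versioned = TRUE)
vetiver_pin_write(board, v)

# Output directory for the generated deployment files
docker_dir <- file.path(tempdir(), "penguins-api")
dir.create(docker_dir, showWarnings = FALSE)
plumber_file <- file.path(docker_dir, "plumber.R")

# API script that loads the pinned model from the board and serves it
vetiver_write_plumber(board, "penguins_species", file = plumber_file)

# Dockerfile (system libraries, R version, R packages, command to run the API)
# plus an renv lockfile listing the required R packages
vetiver_write_docker(v, plumber_file = plumber_file, path = docker_dir)

list.files(docker_dir)
```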
As our data grows, run regular checks of model performance.
Compute key model metrics over time: vetiver::vetiver_compute_metrics()
Store model metrics: vetiver::vetiver_pin_metrics()
Plot the metrics: vetiver::vetiver_plot_metrics()
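A sketch of this monitoring loop, assuming the penguins' `year` column can stand in for an observation date (real monitoring would use the date each new observation arrived) and using accuracy and Cohen's kappa as the metrics. The hypothetical model is trained on 2007 data and checked on 2008 and 2009. `vetiver_pin_metrics()` appends to an existing metrics pin, so the first batch is stored with `pins::pin_write()`; the pin name `penguins_metrics` is also hypothetical.

```r
library(palmerpenguins)
library(tidyr)
library(dplyr)
library(parsnip)
library(workflows)
library(yardstick)
library(vetiver)
library(pins)

penguins_df <- drop_na(penguins) |>
  # Assumption: treat 1 January of each study year as the observation date
  mutate(date = as.Date(paste0(year, "-01-01")))

# Hypothetical model trained on the 2007 data only
penguin_fit <- workflow() |>
  add_formula(species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = filter(penguins_df, year == 2007))

# Later data with predictions alongside the true species
scored <- filter(penguins_df, year > 2007)
scored <- bind_cols(scored, predict(penguin_fit, scored))

# Accuracy and kappa for one batch of data, aggregated per calendar year
compute_yearly <- function(df) {
  vetiver_compute_metrics(df, date_var = date, period = "year",
                          truth = species, estimate = .pred_class,
                          metric_set = metric_set(accuracy, kap))
}

board <- board_temp()

# First batch: store the 2008 metrics as a new pin
pin_write(board, compute_yearly(filter(scored, year == 2008)), "penguins_metrics")

# Next batch: append the 2009 metrics to the existing pin
vetiver_pin_metrics(board, compute_yearly(filter(scored, year == 2009)), "penguins_metrics")

# Read back the full history and plot each metric over time
all_metrics <- pin_read(board, "penguins_metrics")
p <- vetiver_plot_metrics(all_metrics)
p
```

Because `vetiver_pin_metrics()` refuses overlapping dates unless `overwrite = TRUE`, each batch here covers a single, new year.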
Over time we may notice a drop in performance…
Vetiver is available for both Python and R!
In Python you would use Python ML libraries (such as scikit-learn) rather than {tidymodels}
Vetiver documentation: vetiver.posit.co
Life as a Data Scientist isn’t always about machine learning!
Architecture diagrams can be incredibly useful.
… but do consider your target audience!
You can get started on MLOps right now with free and open source tools.
Consider whether it is worth the cost/effort before investing in cloud infrastructure.